Integrating Collocation Features in Chinese Word Sense Disambiguation

نویسندگان

  • Wanyin Li
  • Qin Lu
  • Wenjie Li
چکیده

The selection of features is critical in providing discriminative information for classifiers in Word Sense Disambiguation (WSD). Uninformative features will degrade the performance of classifiers. Based on the strong evidence that an ambiguous word expresses a unique sense in a given collocation, this paper reports our experiments on automatic WSD using collocation as local features based on the corpus extracted from People’s Daily News (PDN) as well as the standard SENSEVAL-3 data set. Using the Naïve Bayes classifier as our core algorithm, we have implemented a classifier using a feature set combining both local collocation features and topical features. The average precision on the PDN corpus has 3.2% improvement compared to 81.5% of the baseline system where collocation features are not considered. For the SENSEVAL-3 data, we have reached the precision rate of 37.6% by integrating collocation features into contextual features, to achieve 37% improvement over 26.7% of precision in the baseline system. Our experiments have shown that collocation features can be used to reduce the size of human tagged corpus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Large-scale Lexical Semantic Knowledge-base of Chinese

The Semantic Knowledge-base of Contemporary Chinese (SKCC) is a large scale Chinese semantic resource developed by the Institute of Computational Linguistics of Peking University. It provides a large amount of semantic information such as semantic hierarchy and collocation features for 66,539 Chinese words and their English counterparts. Its POS and semantic classification represent the latest ...

متن کامل

The semantic Knowledge-base of Contemporary Chinese and Its Applications in WSD

The Semantic Knowledge-base of Contemporary Chinese (SKCC) is a large scale Chinese semantic resource developed by the Institute of Computational Linguistics of Peking University. It provides a large amount of semantic information such as semantic hierarchy and collocation features for 66,539 Chinese words and their English counterparts. Its POS and semantic classification represent the latest ...

متن کامل

A Word Sense Disambiguation Method Using Bilingual Corpus

This paper proposes a word sense disambiguation (WSD) method using bilingual corpus in English-Chinese machine translation system. A mathematical model is constructed to disambiguate word in terms of context phrasal collocation. A rules learning algorithm is proposed, and an application algorithm of the learned rules is also provided, which can increase the recall ratio. Finally, an analysis is...

متن کامل

Inducing Sense-Discriminating Context Patterns from Sense-Tagged Corpora

Traditionally, context features used in word sense disambiguation are based on collocation statistics and use only minimal syntactic and semantic information. Corpus Pattern Analysis is a technique for producing knowledge-rich context features that capture sense distinctions. It involves (1) identifying sense-carrying context patterns and (2) using the derived context features to discriminate b...

متن کامل

Automatic Detection of Collocation

Collocation is a very important relation between words, which can be widely applied to semantic parsing (e.g., word sense disambiguation), machine translation (e.g., automatic alignment of bilingual corpus), computational lexicon, etc. Firstly, we summarized the methods of likelihood interval, likelihood ratio test, u test and χ test for collocation theoretically, and then utilized them to extr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005